ABSTRACT

In this paper, association rules are formulated using a modified binary particle swarm optimization (BPSO)
algorithm based on a mutation function. Association rules are generated without specifying minimum support and
confidence thresholds, which addresses a drawback of the Apriori and FP-growth algorithms. In the proposed method,
the convergence problem of BPSO is reduced using the mutation operator of the genetic algorithm, and the results of
BPSO and the modified BPSO are compared.
INTRODUCTION

Association Rule Mining [1, 2] is a data mining technique used to reveal hidden correlations among the
items of the transactions in a database. An association rule is any rule that expresses an association relationship
among different objects (or itemsets), such as one object implying another, or the occurrence of these objects alone
or together with other objects. Association rules [1, 2] are, in general, if-then rules based on conditional
probability. The two main parameters used for such rules are support and confidence. The support can be thought of
as the percentage of transactions that contain all the items in the rule. The confidence, in turn, can be
characterized as the degree of certainty with which the association holds.

Let a database D contain a number of transactions T. Each transaction contains a number of items belonging to
the itemset I. If n is the number of distinct items in D, then I = {i1, i2, ..., in} is the set of all items present
in the database. Any transaction t ∈ T may contain a variable set of items over I, i.e., {ii, ij, ik} ⊆ I. Each
transaction is associated with a unique identifier, T_ID. An association rule has the form X => Y, where X, Y ⊂ I
and X ∩ Y = ∅; X is the antecedent and Y is the consequent of the rule.

The association X => Y holds in the transaction set T of D with support s. The support s of an association
rule R is the percentage of transactions t that contain X ∪ Y (both X and Y), i.e., the probability P(X ∪ Y) that
the items occur together in a transaction.

Support(X => Y) = sup(R) = P(X ∪ Y).

The association rule R of the form X => Y has confidence c in the transaction set T of D if c is the conditional
probability that a transaction t containing X also contains Y. It is taken as P(Y|X).

Confidence(X => Y) = conf(R) = P(Y|X) = support_count(X ∪ Y) / support_count(X)
= sup(R) / sup(X).

An example of an association rule is as follows:
Cheese => Beer [sup = 10%, conf = 80%]

This rule says that 10% of customers purchase cheese and beer together, and that those who buy cheese also
purchase beer 80% of the time [3], [4].
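The support and confidence definitions above can be illustrated with a minimal Python sketch. The function names and the toy basket data are assumptions made for illustration only; they are not taken from the paper or from any particular library.

```python
from typing import Iterable, List, Set


def support(itemset: Iterable[str], transactions: List[Set[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    hits = sum(1 for t in transactions if wanted <= t)
    return hits / len(transactions)


def confidence(antecedent, consequent, transactions) -> float:
    """Estimate P(Y|X) as sup(X u Y) / sup(X); returns 0.0 if X never occurs."""
    sup_x = support(antecedent, transactions)
    if sup_x == 0:
        return 0.0
    return support(set(antecedent) | set(consequent), transactions) / sup_x


# Hypothetical database: 40 baskets, 4 with cheese and beer, 1 with cheese only.
baskets = ([{"cheese", "beer"}] * 4) + [{"cheese", "bread"}] + ([{"bread"}] * 35)

sup = support({"cheese", "beer"}, baskets)
conf = confidence({"cheese"}, {"beer"}, baskets)
print(f"sup = {sup:.0%}, conf = {conf:.0%}")  # sup = 10%, conf = 80%
```

With this toy data the rule Cheese => Beer reproduces the figures of the example: 4 of 40 baskets contain both items, and 4 of the 5 baskets with cheese also contain beer.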

Frequent Pattern Mining 

Approximate frequent pattern mining (FPM), which aims to discover interesting and generalized knowledge in noisy
databases, has received increasing interest in recent years. Approximate frequent patterns have shown clear
advantages in various domains, such as discovering approximate association rules [5], reconstructing noisy databases [6-8],
and clustering/classification [9, 10].

In real-world applications, data tend to be diverse and dirty, which may be caused by factors such as noise,
imprecise measurements, network latency and sampling errors. When mining interesting knowledge from such
applications, traditional FPM approaches, such as the Apriori [11], FP-growth [12] and Eclat [13] algorithms, face
great difficulties.

Challenges in Approximate FPM

First and foremost, effective FPM algorithms are built on the anti-monotonicity property, which allows them to
prune candidate patterns and narrow the search space. However, the anti-monotonicity property does not hold in the
vast majority of noisy environments. Therefore, approximate FPM algorithms have to resort to heuristic methods to
prune the search space, which provide no guarantee on the completeness of the search, so only imprecise mining
results are obtained.

Moreover, with the violation of the anti-monotonicity property, the mining of approximate frequent patterns (AFPs)
poses new challenges in itemset generation. In the traditional case, all nonempty subsets of a frequent itemset are
also frequent, which is fundamental to any depth-first approach and underlies the success of FPM algorithms. However,
in approximate frequent pattern mining this property does not hold for candidate generation, which means one cannot
obtain the support set of an AFP directly from its sub-patterns. Therefore, multiple scans of the original database
are needed to compute the support of every itemset. Thus the time complexity of the relevant breadth-first algorithms
is exponential in the maximum number of potential itemsets. Furthermore, the support computation of a candidate
itemset has been shown to be NP-hard even when only a fixed number of errors is tolerated in each item [14].
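For contrast with the approximate setting, the following minimal Python sketch shows how the anti-monotonicity property is used for pruning in the classic exact-support case (an Apriori-style level-wise search). The function name `apriori`, the absolute threshold `min_count` and the toy transactions are illustrative assumptions, not the implementation used in the cited works.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {itemset: count} for every itemset occurring in >= min_count transactions."""
    transactions = [frozenset(t) for t in transactions]

    # Level 1: count single items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)

    k = 2
    while frequent:
        items = sorted({i for s in frequent for i in s})
        candidates = []
        for combo in combinations(items, k):
            # Anti-monotonicity: a k-itemset can only be frequent if all of
            # its (k-1)-subsets are frequent, so other candidates are pruned
            # without scanning the database.
            if all(frozenset(sub) in frequent for sub in combinations(combo, k - 1)):
                candidates.append(frozenset(combo))
        frequent = {}
        for cand in candidates:
            count = sum(1 for t in transactions if cand <= t)
            if count >= min_count:
                frequent[cand] = count
        result.update(frequent)
        k += 1
    return result


# Hypothetical toy database.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c"}]
frequent_itemsets = apriori(db, min_count=3)
```

When errors are tolerated in the match between an itemset and a transaction, the subset test in the inner loop is no longer a safe pruning rule, which is the difficulty described above.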

In addition, with missing items in databases, large frequent patterns are broken into short fragments with low
support counts, so that the original “true” frequent patterns cannot be recovered by traditional FPM algorithms. Only
the fragments are obtained, which are less interesting and informative than the original “true” patterns.
Consequently, AFP mining algorithms are proposed with the aim of recovering the original embedded true patterns, but
they face a new dilemma: unlike traditional FPM algorithms, which obtain exact frequent patterns, AFP mining
approaches tend to produce approximately correct results with false positive or false negative errors. If this is not
handled with caution, unreliable or even incorrect mining results may be obtained [15].

Sequential Pattern Mining 

Sequential pattern mining (SPM), proposed by Agrawal [16] for analyzing large volumes of supermarket data, is an
important branch of data mining. It is widely used in web access pattern analysis, market basket analysis, network
fault detection, DNA sequence analysis, etc., and aims to find all sequential patterns whose support exceeds the
minimum support threshold [17]. Traditional SPM algorithms are classified into the following categories: Apriori,
GSP, projection-based and SPADE [18].

Apriori uses a candidate generate-and-test method and is simple and easy to implement. However, Apriori generates a
massive number of itemsets and scans the database repeatedly, and therefore wastes a great deal of time. GSP [19] is
based on the frequent-itemset mining algorithm Apriori and uses time constraints and a sliding window to improve
efficiency, but it still needs to traverse the database multiple times. SPADE [20], by Zaki, transforms the data into
a vertical format, but generates a large number of itemsets. Generating itemsets and pruning branches consume a
considerable amount of time. Based on projection, FreeSpan [21] and PrefixSpan [22] use “divide and conquer” to split
the raw database into smaller projected databases, and then mine the sequential patterns in these smaller databases.
Divide and conquer improves efficiency and scales well. However, this method spends a considerable amount of time
dividing the database into projected databases, and its bottleneck lies in constructing the projected databases and
scanning the data [23].
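The projection idea can be sketched in a few lines of Python. The following is a simplified PrefixSpan-style miner that assumes each sequence element is a single item (the published algorithm also handles itemset elements); the function name, the absolute threshold `min_count` and the toy sequences are assumptions for illustration.

```python
def prefixspan(sequences, min_count):
    """Return {pattern: support_count} for sequences whose elements are single items."""
    results = {}

    def mine(prefix, projected):
        # Count, for each item, how many projected suffixes contain it.
        counts = {}
        for suffix in projected:
            for item in set(suffix):
                counts[item] = counts.get(item, 0) + 1

        for item, count in sorted(counts.items()):
            if count < min_count:
                continue
            pattern = prefix + (item,)
            results[pattern] = count
            # Projected database: what remains of each suffix after the
            # first occurrence of `item`. Empty remainders are dropped.
            new_projected = []
            for suffix in projected:
                if item in suffix:
                    rest = suffix[suffix.index(item) + 1:]
                    if rest:
                        new_projected.append(rest)
            mine(pattern, new_projected)

    mine((), [list(s) for s in sequences])
    return results


# Hypothetical toy sequence database.
seq_db = [["a", "b", "c"], ["a", "c", "b"], ["a", "b"], ["b", "c"]]
patterns = prefixspan(seq_db, min_count=2)
```

Each recursive call works on a smaller projected database, which is the divide-and-conquer step; building those projected lists is also where the cost noted above arises.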

Hierarchical Database 

A new kind of database is emerging both in the research community and in the commercial marketplace. This
new kind of database allows designers to represent hierarchical data in XML form while providing query, transaction
and security services similar to those of commercial relational database software. (In fact, hierarchical databases
are not new; some earlier databases used hierarchical data models such as CODASYL. The hierarchical model is enjoying
a rebirth with the arrival of XML [24].) While it will most likely not replace all relational databases, it is aimed
at just the sort of problem we have defined in implementing the NMP missions and technology database. The database
canonically implements the data hierarchy. A hierarchy, in this case, can be thought of as a tree structure. An
example of such a structure that is familiar to most of the community is the file system directory, as seen in the
Windows or Macintosh stack-of-folders metaphor. Here, parent or higher-level folders may contain child or lower-level
folders, which in turn contain folders of a yet lower level. Every folder may also contain a specific type of data or
file. If used as intended (but not enforced), "child folders", i.e., those contained within their "parent folder",
contain data that is a subset of the types of data contained in the parent folder. In addition, pointers (aliases or
"shortcuts") are provided to allow linking logically connected, but non-adjacent, folders. In most cases, the folders
are displayed not as a tree but as an indented list. While the indented list is a convenient and space-efficient
format, it does, unfortunately, tend to hide the tree structure. Still, it is a construct with which most individuals
are familiar, and for which the underlying structure is readily grasped.
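As a small illustration of the folder metaphor, the sketch below stores a hierarchy as XML and renders it as an indented list using Python's standard `xml.etree.ElementTree` module. The element names (`folder`, `file`), the `name` attribute and the sample content are hypothetical; the paper does not specify a schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical hierarchy: folders may contain files and further folders.
XML_TEXT = """
<folder name="missions">
  <folder name="technology">
    <file name="sensors.txt"/>
    <folder name="propulsion">
      <file name="ion_drive.txt"/>
    </folder>
  </folder>
  <file name="overview.txt"/>
</folder>
"""


def render(node, depth=0):
    """Return the tree as a list of lines, indented by nesting depth."""
    lines = ["  " * depth + f"{node.tag}: {node.get('name')}"]
    for child in node:
        lines.extend(render(child, depth + 1))
    return lines


root = ET.fromstring(XML_TEXT.strip())
outline = render(root)
print("\n".join(outline))

# The stored structure can be queried directly, with no mapping layer.
file_names = [f.get("name") for f in root.findall(".//file")]
```

The indented output is the list view described above, while the nested elements keep the underlying tree explicit.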

The advantages of a hierarchical database are essentially the inverse of the drawbacks of the relational database,
i.e., with a hierarchical database, the hierarchy is the native structure. There is no need to craft custom
interfaces to hide the actual database structure or to interpret it for the user. Hierarchical data is stored in a
hierarchical format (XML). A simple display interface gives the user direct access to the structure as implemented
and the data as stored. System maintenance and debugging efforts are much reduced. In this business, surprises are
not good, and this ability to view the structures as they are tends to minimize surprises [25].
Literature Survey


Table 1: Literature Review

| S. No | Author | Algorithm | Proposed Work |
|---|---|---|---|
| 1. | Ruilin Liu et al. (2016) [26] | SRAM algorithm | An efficient rare association rule mining algorithm called Spark-based rare association rule mining (SRAM), which leverages not only the efficiency of the FP-growth algorithm but also the powerful big data processing mechanism of the Spark platform. The algorithm was implemented on the Spark platform and tested with various data sets. |
| 2. | Morteza Zihayat et al. (2016) [27] | BigHUSP | A new framework for mining high utility sequential patterns (HUSPs) in big data. A distributed and parallel algorithm called BigHUSP is proposed to discover HUSPs efficiently. At its heart, BigHUSP uses multiple MapReduce-like steps to process data in parallel. Several pruning strategies are also proposed to reduce the search space in a distributed environment, and consequently decrease computational and communication costs, while still maintaining correctness. |
| 3. | Mohammad Karim Sohrabi et al. (2016) [28] | CUSE algorithm | A novel bitwise approach to compress and represent the sequence database as a 3-dimensional array, with a corresponding mining method to extract frequent sequences from the compressed structure. Experimental results and a performance study show that this algorithm outperforms the best previously developed algorithms. |
| 4. | Nicolle Chaves Cysneiros et al. (2016) [29] | - | A solution in which an ontology reasoner is used to retrieve the subclasses of the originally queried concepts. These retrieved concepts are used to rewrite the submitted query. |
| 5. | Aneesh K. Sahu et al. (2015) [29] | Cryptography algorithm | The proposed model is able to find global frequent itemsets even when no site can be treated as trusted. The trusted party initiates the process and prepares the merged list. |
| 6. | Hong-Yi Chang et al. (2015) [31] | Apriori algorithm and FP-Growth algorithm | A method that combines the Apriori and FP-Growth algorithms with MapReduce. In the experiments, the block size of the Mapper was varied to obtain execution performance higher than that of the Apriori and FP-Growth algorithms. |
| 7. | Masome Sadat Hoseini et al. (2015) [32] | FP-growth algorithm | A new approach is presented for mining the CanTree, and it is evaluated to show its improvement over the FP-growth technique, which mines the FP-tree. |


Problem Statement 

The existing technique generates association rules using binary particle swarm optimization (BPSO), which suffers
from slow convergence. Because of this, the execution time of the algorithm increases, and a larger number of
association rules is generated, which requires more memory for storage.

Proposed Work and Result Analysis 

In the proposed methodology, binary particle swarm optimization is combined with a mutation function, which
reduces the slow-convergence problem of BPSO. This is done by applying a suitable mutation rate. The combined
approach of BPSO with a mutation function generates rules in less execution time and mitigates the convergence problem.
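The combination of BPSO with a mutation step can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' MATLAB implementation: the rule encoding (one antecedent bit and one consequent bit per item), the fitness function (support multiplied by confidence), the parameter values and the toy data are all hypothetical choices.

```python
import math
import random


def rule_fitness(bits, transactions, items):
    """Assumed fitness: support * confidence of the encoded rule (0.0 if invalid)."""
    n = len(items)
    antecedent = {items[i] for i in range(n) if bits[i]}
    consequent = {items[i] for i in range(n) if bits[n + i]}
    if not antecedent or not consequent or antecedent & consequent:
        return 0.0
    n_x = sum(1 for t in transactions if antecedent <= t)
    n_xy = sum(1 for t in transactions if (antecedent | consequent) <= t)
    if n_x == 0:
        return 0.0
    return (n_xy / len(transactions)) * (n_xy / n_x)


def bpso_with_mutation(transactions, items, n_particles=20, n_iter=50,
                       w=0.7, c1=1.5, c2=1.5, mutation_rate=0.05, seed=0):
    """Binary PSO with a GA-style bit-flip mutation; returns (best_bits, best_fitness)."""
    rng = random.Random(seed)
    dim = 2 * len(items)
    swarm = [[rng.randint(0, 1) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in swarm]
    pbest_fit = [rule_fitness(p, transactions, items) for p in swarm]
    g = max(range(n_particles), key=lambda i: pbest_fit[i])
    gbest, gbest_fit = pbest[g][:], pbest_fit[g]

    for _ in range(n_iter):
        for i, x in enumerate(swarm):
            for d in range(dim):
                v = (w * vel[i][d]
                     + c1 * rng.random() * (pbest[i][d] - x[d])
                     + c2 * rng.random() * (gbest[d] - x[d]))
                vel[i][d] = max(-4.0, min(4.0, v))  # clamp the velocity
                # Sigmoid of the velocity gives the probability of the bit being 1.
                prob = 1.0 / (1.0 + math.exp(-vel[i][d]))
                x[d] = 1 if rng.random() < prob else 0
                # Mutation: flip the bit with a small probability to keep diversity.
                if rng.random() < mutation_rate:
                    x[d] = 1 - x[d]
            f = rule_fitness(x, transactions, items)
            if f > pbest_fit[i]:
                pbest[i], pbest_fit[i] = x[:], f
                if f > gbest_fit:
                    gbest, gbest_fit = x[:], f
    return gbest, gbest_fit


def decode(bits, items):
    """Split a bit vector into (antecedent, consequent) item lists."""
    n = len(items)
    return ([items[i] for i in range(n) if bits[i]],
            [items[i] for i in range(n) if bits[n + i]])


# Hypothetical toy data, not the data set used in the experiments.
items = ["soda", "herring", "cracker"]
data = [{"soda", "herring"}, {"soda", "herring", "cracker"}, {"soda"},
        {"herring", "cracker"}, {"soda", "herring"}]
best_bits, best_fit = bpso_with_mutation(data, items)
lhs, rhs = decode(best_bits, items)
print(lhs, "=>", rhs, round(best_fit, 3))
```

Setting `mutation_rate=0.0` reduces the sketch to plain BPSO, which gives a simple way to compare the two variants on the same data and seed.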


Table 2: Base Results (Elapsed Time Is 2.078145 Seconds) 


| Rule Number | Antecedent | Consequent | Support | Confidence |
|---|---|---|---|---|
| 1 | Herring | Heineken, Corned-B | 85.00 | 85.00 |
| 2 | Herring | Soda | 75.00 | 75.00 |
| 3 | Soda | Heineken, Avacado | 65.00 | 86.67 |
| 4 | P. Frames | Avacado | 45.00 | 90.00 |
| 5 | Soda, Artichoke | Heineken, Baugette | 45.00 | 90.00 |
| 6 | Baugette, Olives | Avacado | 93.33 | 93.33 |
| 7 | Cracker, Baugette | P. Frames, Turkey | 15.00 | 33.33 |
| 8 | Cracker, Heineken, Avacado, Baugette | Soda, Herring | 35.00 | 100.00 |


The table above lists the association rules in the form: Antecedent => Consequent (Support %, Confidence %).

These rules are generated by applying binary particle swarm optimization. The elapsed time of the base algorithm
when executed in MATLAB is 2.078145 seconds.


Table 3: Results With Proposed Experiment (Elapsed Time Is 3.634799 Seconds) 


| Rule Number | Antecedent | Consequent | Support | Confidence |
|---|---|---|---|---|
| 1 | Avacado | Cracker | 45.00 | 56.25 |
| 2 | Soda | Avacado, Baugette | 70.00 | 93.33 |
| 3 | Avacado | Cracker, Turkey | 45.00 | 56.25 |
| 4 | Soda, P. Frames | Herring | 45.00 | 100.00 |


The table above lists the association rules in the form: Antecedent => Consequent (Support %, Confidence %).

These rules are generated by applying binary particle swarm optimization with a mutation operator. The elapsed
time of the proposed algorithm when executed in MATLAB is 1.246828 seconds.



Figure 1: Simulation
Figure 2: Simulation Result


The figure above shows the outcome of program execution.

Figure 1: Rule Comparison
Figure 2: Elapsed Time Comparison


The rule comparison above shows that the base algorithm generates more association rules, while the proposed
approach generates fewer.

The elapsed time comparison above shows that the proposed approach executes in less time than the base
algorithm.

CONCLUSIONS 

In this paper, rules are formed using BPSO combined with mutation, which does not generate redundant rules and
also reduces the convergence problem of BPSO. A comparison of elapsed time and number of rules is also given. The
proposed methodology generates fewer rules, with no redundancy, and requires less elapsed time than the base BPSO
algorithm.